Genome evolution of a nonparasitic secondary heterotroph, the diatom Nitzschia putrida

Secondary loss of photosynthesis is observed across almost all plastid-bearing branches of the eukaryotic tree of life. However, genome-based insights into the transition from a phototroph into a secondary heterotroph have so far only been revealed for parasitic species. Free-living organisms can yield unique insights into the evolutionary consequence of the loss of photosynthesis, as the parasitic lifestyle requires specific adaptations to host environments. Here, we report on the diploid genome of the free-living diatom Nitzschia putrida (35 Mbp), a nonphotosynthetic osmotroph whose photosynthetic relatives contribute ca. 40% of net oceanic primary production. Comparative analyses with photosynthetic diatoms and heterotrophic algae with parasitic lifestyle revealed that a combination of gene loss, the accumulation of genes involved in organic carbon degradation, a unique secretome, and the rapid divergence of conserved gene families involved in cell wall and extracellular metabolism appear to have facilitated the lifestyle of a free-living secondary heterotroph.


INTRODUCTION
The loss of photosynthesis in photoautotrophs is successful if compensated by a competitive advantage arising from the availability of an extracellular energy source. Hence, many secondary heterotrophs evolve as parasites (1-3), relying on sufficient resources provided by their hosts. Well-studied examples are the Apicomplexa [e.g., (4)], which have lost photosynthesis secondarily. However, the loss of photosynthesis can also lead to free-living secondary heterotrophs, which are as common as parasites (2, 5-8). Despite their significance, our knowledge about the evolution of free-living secondary heterotrophs is very limited, and we therefore lack insights into evolutionary processes required for them to thrive without photosynthesis and independently of a resource-providing host. Given that a parasitic lifestyle accelerates the rate of evolution (cf. Red Queen hypothesis) (9) and of loss of conserved orthologous genes [e.g., (10)], the genome analysis of a nonparasitic secondary heterotroph can provide insights uncompromised by parasite-specific adaptations. Hence, the diatom Nitzschia putrida, isolated from mangrove estuaries, is the ideal model to test these hypotheses because it is an example of a free-living secondary heterotroph (5,11) within the diverse group of largely photoautotrophic diatoms (12,13). As several genomes of the latter have recently become available including close phylogenetic relatives (14-16), a genome-based comparative metabolic reconstruction of N. putrida promises to reveal fresh insights into what is required to thrive as a free-living secondary heterotroph. Thus, here, we have analyzed the draft genome sequence of N. putrida, which provides insight into evolutionary processes underpinning lifestyle shifts from photoautotrophy to free-living heterotrophy in the context of a coastal surface ocean ecosystem.
On the basis of this assembly, we estimated a genome size of 35 Mbp, comprising 87 scaffolds with an N50 of 860.9 kbp. The longest scaffold was 3.8 Mbp. The heterozygous regions of the genome (alternate contigs) estimated by the Falcon assembler totaled 12 Mbp, with an N50 of 121 kbp (table S1). The Falcon assembly was error-corrected and polished with approximately 150-fold coverage of Illumina short reads, which were subsequently used for generating the final assembly with Pilon 1.2.2 (19) including manual curation.
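The N50 values quoted above summarize assembly contiguity: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative assumptions and are not the N. putrida scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downward until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (bp), for illustration only.
toy_scaffolds = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_scaffolds)
```

For a real assembly, the input would be the list of scaffold lengths read from the FASTA file.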
According to the k-mer-assessed diploid nature of the N. putrida genome, the read coverage of the homozygous regions is approximately twofold higher than the read coverage for the heterozygous regions, suggesting the presence of diverged alleles as previously identified in the genome of the photoautotrophic diatom Fragilariopsis cylindrus (fig. S1, A and B). Thus, most of the diverged allelic variants can be found in the heterozygous regions characterized by the presence of alternate contigs, whereas the regions with no corresponding alternate contigs are homozygous (fig. S1B). On the basis of the analysis with Braker2 version 2.0.3 (20), the Nitzschia genome comprises 15,003 and 5,767 inferred protein-coding loci on the primary and alternate contigs, respectively (table S1). Almost 40% of loci in the genome of N. putrida appear to be characterized by diverged alleles. A BUSCOv3 analysis (21) revealed the genome to be complete at a level of 90.1% based on the haploid set of genes.
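As a quick arithmetic check, the "almost 40%" figure is consistent with the ratio of loci on alternate contigs to loci on primary contigs. The text does not state how the percentage was derived, so this reading is an assumption, and the function name is illustrative.

```python
def diverged_allele_percent(primary_loci, alternate_loci):
    """Percentage of primary-contig loci that have a counterpart on alternate contigs.

    Assumes each alternate-contig locus corresponds to one primary-contig locus,
    which the source does not state explicitly.
    """
    if primary_loci <= 0:
        raise ValueError("primary_loci must be positive")
    return 100.0 * alternate_loci / primary_loci


# Locus counts as reported in the text (table S1).
percent_diverged = diverged_allele_percent(15_003, 5_767)
```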

The loss of photosynthesis
The haploid set of genes was used to reconstruct the nuclear-encoded plastid proteome of N. putrida and therefore to reveal the extent of gene loss including key genes of photosynthesis. A comparative analysis of the N. putrida plastome (22) with its photosynthetic counterparts revealed that more than 50% of nuclear-encoded plastid proteins have been lost (Fig. 1B). More than 500 orthogroups (OrthoFinder) (23) of nuclear-encoded plastid proteins, which are usually shared between photosynthetic diatoms (22), are missing in the predicted plastid proteome of N. putrida (Fig. 1C). The missing part of the plastid proteome included genes encoding proteins of the light-harvesting antenna including fucoxanthin-chlorophyll a/c protein (fcp), photosystems II and I (e.g., psbA, psbC, psbO, psaA, psaB, and psaD), the cytochrome b6/f complex (e.g., petA), and carbon fixation (e.g., rbcS and rbcL) in addition to genes of the Calvin cycle [...]

Fig. 1. [...] (22), respectively. Data of the two photosynthetic diatoms P. tricornutum and T. pseudonana are derived from the previous study (22). (C) Unique and shared plastid-targeted orthogroups. Highlighted in red is the orthogroup exclusively shared by the two photosynthetic diatoms. (D) Predicted metabolic map of the nonphotosynthetic plastid. Representative pathways found in photosynthetic diatom species are shown. Green and light gray arrows show the presence and absence of the responsible protein sequences for the reactions in the genome, respectively. Amino acids are highlighted in red. Abbreviations are described in the Supplementary Materials.

Despite the loss of some of these key photosynthesis genes, a substantial number of genes remain that encode common plastid metabolic pathways known from photosynthetic diatoms, including the generation of adenosine 5′-triphosphate (ATP) by adenosine triphosphatase (ATPase) subunits, which are encoded in both the nuclear and plastid genomes (24). Almost all genes for plastid enzymes that synthesize essential amino acids are still encoded in the nuclear genome of N. putrida. Furthermore, all genes of the heme pathway have been found, and N. putrida appears to be able to synthesize riboflavin. The presence of plastid-targeted transporters (25) enables the transport of phosphoenolpyruvate, 3-phosphoglycerate, and dihydroxyacetone-phosphate across the plastid membranes. In addition, our genome-based reconstruction of plastid metabolism identified the biosynthesis pathway for lipids and the ornithine cycle in N. putrida (Fig. 1D and fig. S2). The latter has been reported neither in previous transcriptome-based studies with this species (26) nor in any other secondary heterotroph [e.g., (6,7)]. As N. putrida has an osmotrophic lifestyle, it relies on dissolved organic matter and nutrients. Thus, the biosynthesis of a variety of metabolic compounds supports the osmotrophic lifestyle of an organism that can neither prey on nor parasitize other organisms.

Communication between organelles and light-dependent gene expression
The lack of CO2 fixation in plastids of N. putrida, which reduces the amount of amino acids, lipids, and other metabolites to be synthesized, appears to be partially compensated for by the remodeling of metabolic interactions with mitochondria and peroxisomes and by retaining active recycling of nitrogen (Figs. 1 and 2 and figs. S3 and S4). It appears that the nonphotosynthetic plastid of N. putrida still exchanges glutamine and ornithine, both of which are important intermediates of the ornithine cycle. All genes for the ornithine-urea cycle have been retained in the N. putrida genome. The ornithine-urea cycle is indispensable for nitrogen recycling in photosynthetic diatoms (27,28), and even after the loss of photosynthesis, nitrogen recycling appears to be essential in N. putrida (Fig. 2A) due to its osmotrophic lifestyle. Usually, the ornithine-urea cycle is tightly linked with the tricarboxylic acid cycle and/or photorespiration in photosynthetic diatoms (27,28). However, N. putrida is not likely to perform photorespiration (Fig. 2A). The metabolic exchange with the peroxisome through glycolate has likely ceased, as phosphoglycolate phosphatase and peroxisomal glycolate oxidase are missing. Thus, photorespiration is unlikely to take place in nonphotosynthetic plastids of N. putrida due to the lack of ribulose 1,5-bisphosphate carboxylase/oxygenase and other key enzymes of the Calvin cycle (Fig. 1). Nevertheless, peroxisomes still appear to play a role in N. putrida for the production of malate or glyoxylate, which feed into respiratory pathways of the mitochondria to support ATP and NADPH (reduced form of nicotinamide adenine dinucleotide phosphate) production (Fig. 2A).
In photosynthetic organisms, light not only drives photosynthesis to generate ATP and NADPH but also regulates cell division, diel cycles, and different signaling processes, unlike in many heterotrophic organisms (29-31). As a consequence, we identified the remaining photoreceptors and cell-cycle regulators such as cyclins and cyclin-dependent kinases (32). Although they were still encoded in the genome of N. putrida and expressed (Fig. 2B and fig. S5, A and B), we were unable to identify a diel cycle in cell division (Fig. 2C). This suggests that these cell-cycle regulators potentially have neo/subfunctionalized and therefore have a different regulatory role in N. putrida unrelated to the diel cycle. The loss of the transcription factor bHLH-1a (RITMO1), which has been identified as a master regulator of diel periodicity (33), corroborates our finding that N. putrida has lost the ability to perform diel cycles. In addition, most of the other photoreceptors known from photosynthetic diatoms have also been lost (Fig. 2B), such as the blue light-sensing aureochromes 1a/b, both of which are transcription factors responsible for photoacclimation (34). Despite the lack of light-dependent cell-cycle regulation, a few remaining photoreceptors were identified including bHLH1b_PAS, aureochrome 1c, and cryptochrome-DASH/CPF2 (Fig. 2B) (29,35). Basic leucine zipper (bZIP) transcription factors having potentially light-sensitive Per-Arnt-Sim (PAS) domains (bZIP-PAS) (36) were also identified in the N. putrida genome, such as homologs of bZIP6 and bZIP7 of Phaeodactylum tricornutum (37). The latter homolog has been duplicated and diversified in N. putrida (fig. S5C). The presence of bZIP-PAS proteins in a heterotrophic eukaryote is not unprecedented, as some oomycetes, which are nonphotosynthetic parasites, have been reported to also encode them in their genomes [e.g., (38)].
Although their role in regulating gene expression remains to be investigated in N. putrida, light still appears to influence the expression of some genes in this heterotrophic species. Comparative transcriptome analyses every 4 hours during a shift from a light phase to darkness (Fig. 2D) revealed eight clusters characterized by different expression patterns. Furthermore, there was no cluster explicitly representing the light-dependent gene expression patterns as seen in photosynthetic algae [e.g., (29,39)]. However, one of the clusters contained genes only expressed in the mid-light phase: cluster 7, containing 90 genes (0.6% of the total). Forty-four of them were genes with known functional domains based on a KOG (EuKaryotic Orthologous Groups) analysis, and 21 of them encoded proteins for substrate import and carbon metabolism (fig. S5D). However, the photoreceptor homologs above (bHLH1b_PAS, aureochrome 1c, and cryptochrome-DASH/CPF2) were not part of this cluster, and there was no explicit trend in their gene expression patterns with respect to changes between light and dark conditions.

The genetic toolkit for the evolution of nonparasitic secondary heterotrophy
Despite the loss of many nuclear genes and their families, the genome size of N. putrida is not significantly different from that of photosynthetic relatives such as F. cylindrus and P. tricornutum and the more distantly related diatom Thalassiosira pseudonana (table S1). This is distinct from evolutionary trends observed in parasitic eukaryotes that have lost photosynthesis, as they have smaller genomes encoding smaller gene families compared to their photosynthetic relatives (fig. S6). By comparing KOGs of paralogous proteins, there was no significant difference in the number of unique KOG IDs between these four diatom species (fig. S7, A and B). However, when we compared the number of paralogous proteins assigned to each KOG ID, there were several KOG categories for which N. putrida had a higher number of paralogous proteins compared to the other diatom species: nucleotide transport (F), transcription (K), signal transduction (T), intracellular trafficking, secretion, vesicular transport (U), and cytoskeleton (Z) (fig. S7C). Even after normalization by total gene numbers, nucleotide transport (F), signal transduction (T), Cell cycle [...]

A microbial heterotroph acquires nutrients either by phagotrophy, the preferred nutrition of many parasites, or by osmotrophy. The latter requires uptake of dissolved organic compounds by osmosis as realized by bacteria and fungi, for instance (40,41). As N. putrida grows well under axenic conditions (5,42), it is likely an osmotroph, dependent on the uptake of dissolved organic compounds across the silicified cell wall and the plasma membrane. As in osmotrophic fungi, N. putrida may even be able to degrade higher-molecular weight compounds extracellularly to be subsequently taken up as individual molecules by specific transporters or even osmosis (40,41). Thus, it is likely that cell wall, membrane, and secreted proteins were diversified in N. putrida compared to photosynthetic diatoms to facilitate osmotrophy.
To address this hypothesis, we analyzed the enrichment of paralogous proteins and differences in nutrient transporters involved in the uptake of dissolved organic compounds such as solute carriers. A comparison to photosynthetic diatoms and parasitic nonphotosynthetic algal species [Prototheca and Helicosporidium (green algae), and the apicomplexans Plasmodium and Toxoplasma] revealed that N. putrida has a unique composition of genes encoding transporters that differs from both photosynthetic algae and parasitic nonphotosynthetic algal species (Fig. 3, A to C). For instance, the number of genes encoding silicon transporters (SITs), solute symporters, and the resistance-nodulation-cell division superfamily was more than twice as high in N. putrida as in photosynthetic diatom species (Fig. 3D and fig. S9A). However, in contrast to the difference between N. putrida and photosynthetic diatom species, there is no enrichment of particular transporters in parasitic algal species when compared to their photosynthetic relatives (fig. S9, B and C).
Expansion of those gene families may, at least partly, have been achieved by recent tandem duplications (Fig. 3E). To gain insight into when the expansion had occurred, we performed a coalescence analysis, which revealed that SITs in N. putrida began to expand around 3.3 million years (Ma) ago [1.2 to 6.6, 95% confidence interval (CI)], while divergence from another nonphotosynthetic diatom N. alba is estimated to have occurred around 6.67 Ma ago (2.5 to 11.5, 95% CI; fig. S10). The split between F. cylindrus and P. multiseries, which was used to date the tree, was estimated at 9.7 Ma ago (7.6 to 11.6, 95% CI).
Thus, the recent expansion of SITs suggests neo/subfunctionalization of the gene family in response to the change in lifestyle. The divergence rate of SIT genes was much larger than that of control genes (e.g., myosin), indicating that SIT diversification might have contributed to the adaptation of the heterotrophic lifestyle. In support of this hypothesis, we detected several sites under positive selection in different members of the SIT family (table S2), which implies that the evolution of those genes may have been driven by diversifying selection.
The solute sodium symporters are estimated to have diverged around 7.5 Ma ago (3.8 to 11.1, 95% CI), markedly earlier than the SIT gene family. Although the divergence rate is also larger than that of control genes (fig. S10), we did not find evidence of diversifying selection in this gene family. The differences between these two families of transporters suggest that their expansion might have occurred in a stepwise manner and been driven by different evolutionary forces.
Furthermore, although the overall carbohydrate-active enzyme (CAZyme) family composition of Nitzschia was not different from that of photosynthetic diatoms (fig. S11), families encoding β-glycoside hydrolase (GH8), laminarinase (GH16_3), pectinase (GH28), β-glucanase (GH72), α-mannan hydrolyzing enzymes (GH99), and 1,2-glucan hydrolytic enzymes (GH114) were enriched in N. putrida compared to photosynthetic species (Fig. 3F). Expansion of these families might, at least partly, have been achieved by recent tandem duplications (Fig. 3G), suggesting an important role of these genes for the heterotrophic lifestyle of N. putrida. Notably, more than one-third of proteins assigned to the above six CAZyme families are predicted to be secreted in N. putrida (see below). The CAZyme compositions suggest that N. putrida might be able to degrade extracellular polysaccharides such as β-1,3-glucans (e.g., lichenin, paramylon, callose, and laminarin), starches, 1,2-glucans, pectin, and α-mannan. As N. putrida has been isolated from disintegrating mangrove leaves in a paddle (5,42), this species might play a role in degrading dead leaves and therefore facilitating carbon recycling in mangroves. To gain first insight into how transcription of CAZyme genes is regulated by different carbon sources, we performed comparative transcriptome analyses with starved N. putrida cells in comparison to cells growing on glucose and starch. However, we found that only a limited number of genes encoding CAZymes were differentially expressed (table S3). About half of these genes were up-regulated specifically in response to starch as a carbon source, while only one CAZyme gene was up-regulated in response to glucose (table S3). This observation suggests that most CAZymes in N. putrida are not involved in glucose utilization and that only very few are involved in starch utilization.
Arguably, providing a very limited set of organic substrates does not reflect the complexity of organic carbon provided by disintegrating leaves in a mangrove ecosystem. Hence, this might be the main reason for the limited transcriptome response observed in our experiments.
The predicted secretome of the nonparasitic, free-living secondary heterotroph N. putrida
Given that the secretome plays an important role for substrate degradation and subsequent uptake of low-molecular weight compounds in osmotrophs (40), we conducted a comparative analysis to predict secreted proteins of N. putrida in silico by identifying proteins with N-terminal signal peptides and a lack of transmembrane domains. The resulting proteins were clustered using TribeMCL (43), and plastid- and lysosome-localized proteins were subsequently removed using ASAFind according to their characteristic targeting motifs (22) and Pfam domains. The number of putatively secreted proteins is 978, 998, 596, and 718 in N. putrida, F. cylindrus, P. tricornutum, and T. pseudonana, respectively, which corresponds to between 5 and 7% of the total number of genes in their genomes (fig. S12A). Nevertheless, there were significant differences when we compared the diversity of proteins between these four diatom species (Fig. 4, A and B); N. putrida, on average, had a significantly higher number of proteins per tribe than any of the other diatom species (two-sided Wilcoxon signed-rank test; P < 0.01; Fig. 4C). In particular, proteins involved in heterotrophy such as organic matter degradation/modification including CAZymes and peptidases were more abundant in N. putrida than in the photosynthetic diatom genomes (188 in N. putrida, 142 [...] T. pseudonana; fig. S12A). This is in contrast to parasitic green algae because their predicted secretomes are smaller than those of their photosynthetic relatives and show no explicit enrichment of secretome proteins per tribe (Fig. 4, D to F).

[Figure taxon labels: Fragilariopsis, Phaeodactylum, Thalassiosira, Plasmodium, Toxoplasma, Vitrella, Chromera, Helicosporidium, Prototheca, Auxenochlorella, Chlorella, Coccomyxa]

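The in silico secretome prediction described in this section combines three criteria: an N-terminal signal peptide, no transmembrane domain, and no plastid or lysosome targeting. A minimal sketch of that filtering logic is given here, assuming the per-protein predictions have already been produced by external tools and loaded into simple records. The record type, field names, and example proteins are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinCall:
    """Hypothetical per-protein summary of upstream predictions."""
    protein_id: str
    has_signal_peptide: bool   # N-terminal signal peptide predicted
    n_transmembrane: int       # number of predicted transmembrane domains
    organelle_targeted: bool   # predicted plastid- or lysosome-localized


def putatively_secreted(calls):
    """Return IDs of proteins that pass all three secretome criteria."""
    return [
        c.protein_id
        for c in calls
        if c.has_signal_peptide
        and c.n_transmembrane == 0
        and not c.organelle_targeted
    ]


# Toy records for illustration only.
toy_calls = [
    ProteinCall("prot_A", True, 0, False),   # passes all criteria
    ProteinCall("prot_B", True, 2, False),   # membrane protein, excluded
    ProteinCall("prot_C", True, 0, True),    # plastid-targeted, excluded
    ProteinCall("prot_D", False, 0, False),  # no signal peptide, excluded
]
toy_secretome = putatively_secreted(toy_calls)
```

Keeping the three criteria as separate fields makes it straightforward to report how many candidates each filter removes.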
The most common secreted proteins in N. putrida are leucine-rich repeat (LRR)-containing proteins (fig. S12B), many of which contain additional domains such as tegument and glycoprotein domains, suggesting an increased functional diversity (fig. S13). Only very few LRR-containing proteins were identified in the predicted secretomes of the photosynthetic diatoms, indicating that signal peptide-dependent secretion of abundant and diverse LRR-containing proteins may be an essential requirement in this secondary heterotroph, such as for environmental signaling (44). In addition to LRR-containing proteins, the top 10 most enriched proteins in N. putrida were von Willebrand factor type D (VWFD) proteins involved in adhesion or clotting, two types of endopeptidases, trypsin and leishmanolysin (cell-surface peptidase of the human parasite Leishmania), intradiol ring-cleavage dioxygenase protein for degradation of aromatic compounds, methyltransferase, and four proteins with unknown function (fig. S12B). LRR-containing proteins and VWFDs might play important roles in N. putrida for attaching to disintegrating mangrove leaves (5,42,45). Endopeptidases and aromatic compound degradation may facilitate the utilization of the complex carbon compounds of these leaves.
Furthermore, transcriptional dynamics of the predicted secretome over a diel cycle (Fig. 2) revealed the presence of four different clusters. Genes in cluster 1 were transcribed at the beginning of the first light phase and genes in cluster 2 at the end of the dark phase and into the second light phase (Fig. 4G). Genes of cluster 3 were most strongly expressed in the middle and end of the first light phase, whereas genes in cluster 4 were relatively weakly expressed throughout day and night. These results suggest that stimuli including light and/or nutrients play a role in the regulation of these genes, which might either be a relict from the photosynthetic ancestor or a response to diel cycles of organic substances in the aquatic system occupied by N. putrida. There is only weak evidence of lateral transfer of secretome genes in N. putrida, with five genes of potential lateral origin (figs. S14 and S15). Thus, most secretome proteins in N. putrida were likely derived vertically from homologs of a photosynthetic ancestor.

DISCUSSION
N. putrida experienced a series of genetic adaptations toward a heterotrophic lifestyle. This diatom species took a step backward in one of the major evolutionary transitions, from photoautotrophs to heterotrophs, potentially relaxing selection on some of the now redundant gene networks and their functions. As expected, more than 50% of nuclear-encoded plastid proteins have been lost in the N. putrida plastid proteome in comparison to its photosynthetic counterparts (22). However, the total number of genes (~15,000) fell within the range of photosynthetic microalgae, and we found no evidence of pseudogene formation, genome streamlining [e.g., (46)], gene family contraction (cf. birth-and-death hypothesis) (47), or reductive genome evolution (Black Queen hypothesis) (48). The relatively large genome size is not unexpected given that N. putrida is a free-living osmotroph. This free-living lifestyle in a complex and highly variable coastal marine environment is likely the reason why a substantial number of genes including some photoreceptors, cell-cycle regulators, and common plastid metabolic pathways usually present in photosynthetic diatoms have remained. Although some of the latter genes were still expressed, N. putrida appears to lack a diel growth cycle, which suggests that these cell-cycle regulators have neo/subfunctionalized. However, as a certain number of genes still appear to be regulated by light, osmotrophy potentially benefits from diel fluctuations of resources such as dissolved organic carbon in aquatic environments (49-51). For photoautotrophs, it is important to regulate the cell cycle in accordance with diel cycles for optimizing photosynthesis and therefore cell proliferation (29-31, 52, 53). Without being reliant on light as its primary energy source, the osmotroph N. putrida no longer needs to coordinate its cell cycle with diel cycles.
Thus, after the loss of photosynthesis, the strict light-dependent regulation of gene expression might have become less important and gene expression therefore may have become predominantly regulated by other stimuli. Many photoreceptors are missing, but genes for bZIP transcription factors with PAS domains have been duplicated, and genes for signal transduction and cellular regulatory roles, such as those with adenyl/guanyl cyclase and cyclic nucleotide esterase domains, are enriched in the N. putrida genome. Furthermore, the peroxisome-plastid interaction is no longer required after the loss of photosynthesis, because the loss of carbon fixation removes the need for glycolate recycling. In contrast, the ornithine-urea cycle likely remains functional to facilitate nitrogen recycling.
Gene family expansions and neo/subfunctionalizations appear to have played a prominent role in the adaptation to its different lifestyle given that many proteins predicted to be secreted have diversified in N. putrida, possibly to facilitate osmotrophy. Together, the marked change of lifestyle associated with the "devolution" did not result in reductive genome evolution as known from nonphotosynthetic plastid-bearing parasites.

Genome assembly and construction of gene models
PacBio reads were assembled into contigs using Falcon (version 0.7.0) (18) with a length cutoff of 7000 bp for seed reads and an estimated genome size of 33 Mbp. Genome size estimation was performed on the GenomeScope web server (http://qb.cshl.edu/genomescope/) based on the k-mer frequency distribution of Illumina reads calculated by JellyFish version 2.2.6 with a k-mer size of 21. The resultant primary and associated contigs were then subjected to Falcon_unzip (version 0.5.0) (18), generating partially haplotype-phased contigs (primary contigs) and fully phased contigs (haplotigs). The assembly was polished using PacBio reads and the Quiver program, followed by single-nucleotide polymorphism (SNP) and short insertion-deletion (indel) error correction using Pilon (version 1.2.2) with Illumina reads mapped by the Burrows-Wheeler Aligner (version 0.7.15) (19). Indel errors in the vicinity of hetero-SNPs were further fixed manually, as they were difficult to correct automatically. Contigs derived from plastid and mitochondrial genomes were identified using BLASTN and separated from contigs derived from the nuclear genome.
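The idea behind estimating genome size from a k-mer frequency distribution can be illustrated with a simple peak-based estimator: divide the total number of k-mer occurrences by the depth of the main coverage peak. This is a minimal sketch and not the model GenomeScope fits. It assumes that the most frequent depth above an error cutoff corresponds to the homozygous peak, which can fail for highly heterozygous diploid genomes. The function name, the `min_depth` cutoff, and the toy histogram are illustrative assumptions.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate haploid genome size (bp) from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers, taken as the main coverage peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the non-error part of the histogram.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram for illustration only: an error spike at depth 1
# and a coverage peak at depth 20.
toy_histogram = {1: 1000, 10: 50, 20: 200, 30: 50}
toy_size = estimate_genome_size(toy_histogram)
```

A histogram in this form could be built from the two-column output of a k-mer counter's histogram command.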
Gene models that overlapped with the BRAKER results were removed using BLASTP with an E-value cutoff of 1e-5, and the remaining gene models were merged with the BRAKER gene models to generate the final gene annotation. Transposable elements in the NIES-4239 genome were identified with RepeatMasker (version 4.9.0) using Dfam3.1 and RepBase-20170127 as reference repeat libraries (61). The predicted gene set is available in Dryad (https://doi.org/10.5061/dryad.j3tx95xft).
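The BLASTP-based removal of redundant gene models at the start of the paragraph above can be sketched as a filter over tabular BLAST output. This is a minimal sketch, assuming the standard 12-column tabular format (query ID in column 1, E-value in column 11) and treating any query with a hit at or below the cutoff as overlapping. The source does not specify the output format or the exact overlap criterion, and the function names and example IDs are hypothetical.

```python
def queries_with_hits(blast_tab_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue."""
    hits = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        # Column 11 (index 10) of the standard tabular format is the E-value.
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def keep_non_overlapping(model_ids, blast_tab_lines, max_evalue=1e-5):
    """Keep gene models without a significant hit to the reference gene set."""
    hits = queries_with_hits(blast_tab_lines, max_evalue)
    return [m for m in model_ids if m not in hits]


# Toy tabular lines for illustration only.
toy_blast = [
    "modelA\tbraker1\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-30\t400",
    "modelB\tbraker2\t30.0\t50\t35\t0\t1\t50\t1\t50\t2.0\t20",
]
toy_kept = keep_non_overlapping(["modelA", "modelB", "modelC"], toy_blast)
```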
The integrity of the gene annotation was assessed using BUSCO (version 3.0.2) (21) with the Eukaryota odb9 (version 2) dataset. SAM/BAM files were manipulated with SAMtools (version 1.9). Sequences of gene regions were extracted from the GFF file with GffRead (version 0.9.11) (62).
Organellar genome annotation was performed by comparison with previously sequenced organellar genomes of nonphotosynthetic diatoms (24). Gene sets and their arrangements of the plastid and mitochondrial genomes sequenced in this study were found to be identical to previously sequenced nonphotosynthetic diatoms (24). Assembled genomes were deposited to DNA Data Bank of Japan (http://getentry.ddbj.nig.ac.jp/) under the accession numbers BLYE01000001 to BLYE01000234 for the nuclear genome, LC600866 for the mitochondrial genome, and LC600867 for the plastid genome.

Functional annotation
The predicted protein-coding genes were annotated using InterProScan, and an RPS-BLAST search was performed against the KEGG orthology database (63,64). KO identifiers for Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways were assigned using the KEGG Automatic Annotation Server (65). Transporter proteins were annotated with TransportTP (66) followed by manual curation. Reference proteome datasets for three photosynthetic diatom species were obtained from the JGI Genome Portal: P. tricornutum CCAP 1055/1 v2.0 (Phatr2_bd_unmapped_GeneModels_FilteredModels1_aa.fasta and Phatr2_chromosomes_geneModels_FilteredModels2_aa.fasta, 10,402 protein sequences in total), T. pseudonana CCMP 1335 (Thaps3_bd_unmapped_GeneModels_FilteredModels1_aa.fasta and Thaps3_chromosomes_geneModels_FilteredModels2_aa.fasta, 11,776 sequences), and F. cylindrus CCMP 1102 (Fracy1_GeneModels_FilteredModels3_aa.fasta, 21,066 sequences). KEGG and KOG annotation was performed for them in the same manner as for NIES-4239. Other details for annotation of CAZymes, cyclins, cyclin-dependent kinases, bZIP transcription factors, photoreceptor proteins, mitochondrial proteins, plastid proteins, and secretome proteins are described in the Supplementary Materials. Evolutionary analyses, comparative transcriptome analyses under the 12-hour light and 12-hour dark condition, those in different carbon sources, and biochemical experiments for lipids, fatty acids, and quinones are also described in the Supplementary Materials. Transcriptome data obtained in this study were deposited in the DNA Data Bank of Japan (https://ddbj.nig.ac.jp/resource/bioproject/PRJDB11016 and https://ddbj.nig.ac.jp/resource/bioproject/PRJDB12553).

SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at https://science.org/doi/10.1126/sciadv.abi5075